IE 582 Homework 1 - FALL'24¶

Berat Kubilay Güngez - 2021402087

Table of Contents¶

  1. Introduction
  2. Related Literature
  3. Data Preprocessing and Analysis
    • 3a. Imports
    • 3b. Analysis of the Output
    • 3c. Analysis of the Input
  4. Comparison in Linear Models
  5. Conclusions
  6. Code

1. Introduction ¶

As high-frequency communication technologies like 5G evolve, designing efficient antennas has become crucial. Antenna performance, often evaluated by the S11 parameter, requires computationally intensive electromagnetic (EM) simulations, making traditional trial-and-error methods impractical. To address this, machine learning provides a data-driven approach to model and predict antenna characteristics based on design parameters. This assignment uses techniques like Principal Component Analysis (PCA) for dimensionality reduction and linear regression for predictive modeling to better understand and simplify the complex relationships within antenna design, aiming to improve efficiency in creating high-performance systems.

2. Related Literature ¶

In electrical network analysis, S-parameters (scattering parameters) describe how an electromagnetic signal interacts at different network ports, typically in high-frequency circuits. The S11 parameter, a specific type of S-parameter, measures the reflection at the input port—indicating how much signal is reflected back rather than transmitted. This reflection coefficient helps engineers assess antenna efficiency and impedance matching, critical for minimizing signal loss in RF designs. S-parameters like S11 are widely used for their practicality in evaluating performance across frequencies without needing complex internal device details.

For more, see Ansys on S-parameters.
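Since S11 is usually reported in decibels in the antenna literature, the sketch below (with made-up reflection magnitudes, not values from this dataset) shows the standard conversion from linear magnitude to dB; lower (more negative) dB values mean less reflected power.

```python
import numpy as np

# Hypothetical linear |S11| magnitudes (1.0 = total reflection)
s11_mag = np.array([0.9, 0.5, 0.1, 0.03])

# Standard conversion: S11 [dB] = 20 * log10(|S11|)
s11_db = 20 * np.log10(s11_mag)

# A common rule of thumb: |S11| <= 0.1 (i.e. -20 dB or lower)
# indicates good impedance matching at that frequency
print(s11_db)
```

This is why "lower magnitude is better" throughout this report: a magnitude near 1 corresponds to roughly 0 dB, i.e. almost everything reflected back.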

3. Data Preprocessing and Analysis ¶

3a. Imports ¶

In [170]:
import pandas as pd # data processing
import numpy as np # mathematical operations
import matplotlib.pyplot as plt # data visualization
import seaborn as sns # data visualization
from sklearn.preprocessing import StandardScaler # data scaling
from sklearn.decomposition import PCA # dimensionality reduction
from sklearn.model_selection import train_test_split # data splitting
import statsmodels.api as sm # statistical modeling

3b. Analysis of the Output ¶

To understand the characteristics of the S11 parameter data across frequencies, the real and imaginary components of the data are imported and the magnitude is computed from them. Since the magnitude of the S11 parameter captures antenna performance, lower values indicate reduced signal reflection.

In [171]:
real_output_data_loc = "data/hw1_real.csv" 

real_output_df = pd.read_csv(real_output_data_loc)

img_output_data_loc = "data/hw1_img.csv" 

img_output_df = pd.read_csv(img_output_data_loc)

output_df = np.sqrt(real_output_df **2 + img_output_df **2) # Magnitude of the S11 parameter

output_df.head()
Out[171]:
0 1 2 3 4 5 6 7 8 9 ... 191 192 193 194 195 196 197 198 199 200
0 0.781778 0.783440 0.785795 0.788790 0.792359 0.796433 0.800936 0.805793 0.810930 0.816277 ... 0.988520 0.988551 0.988581 0.988610 0.988638 0.988664 0.988690 0.988714 0.988737 0.988759
1 0.986860 0.986669 0.986470 0.986263 0.986048 0.985824 0.985592 0.985350 0.985100 0.984839 ... 0.945061 0.945880 0.946669 0.947428 0.948158 0.948861 0.949537 0.950188 0.950814 0.951417
2 0.866883 0.865643 0.864258 0.862724 0.861039 0.859198 0.857199 0.855036 0.852705 0.850202 ... 0.865665 0.861328 0.856551 0.851268 0.845404 0.838870 0.831564 0.823368 0.814148 0.803752
3 0.995069 0.995055 0.995041 0.995025 0.995009 0.994991 0.994973 0.994953 0.994933 0.994912 ... 0.935682 0.932755 0.929637 0.926312 0.922765 0.918978 0.914932 0.910607 0.905982 0.901033
4 0.985009 0.985235 0.985447 0.985645 0.985831 0.986005 0.986169 0.986321 0.986464 0.986597 ... 0.988472 0.988418 0.988361 0.988304 0.988244 0.988183 0.988121 0.988057 0.987991 0.987924

5 rows × 201 columns

To begin, a single output is selected to compare the distributions of its real part, imaginary part, and magnitude. The third sample was chosen because its curve shows distinctive behavior across the frequency range.

In [172]:
plt.figure(figsize=(12, 6))

plt.plot(output_df.iloc[2], label="Magnitude")
plt.plot(real_output_df.iloc[2], label="Real part")
plt.plot(img_output_df.iloc[2], label="Imaginary part")

plt.title("S11 Parameter of the 3rd Sample")
plt.legend(loc="best", fontsize=10)

plt.xticks(range(0, 200, 25))
plt.xlabel("Frequency")
plt.show()

Then, the first 15 samples are plotted to observe their differing characteristics.

In [173]:
plt.figure(figsize=(12, 6))

for i in range(0,15):
    plt.plot(output_df.iloc[i], label=f"output {i}")

plt.legend(loc="best", fontsize=6)

plt.title("Magnitude of the first 15 outputs")

plt.xticks(range(0, 200, 25))
plt.xlabel("Frequency")
plt.show()

As can be seen in the plot above, each sample exhibits distinct characteristics across frequency values. Among these samples, output 2 appears to perform best around the 60th frequency value. Roughly six outputs reflect nearly all of the signal back, meaning they perform poorly.

To apply predictive approaches later, the data needs to be reduced to a single output value that captures the variability across all frequencies. One option is to use minimum values; to make this more robust, the average of the 25 smallest values is calculated.

In [174]:
min_values_output = output_df.apply(lambda x: x.nsmallest(25).mean(), axis=1)
min_values_index = output_df.idxmin(axis=1)

min_values_real = real_output_df.apply(lambda x: x.nsmallest(25).mean(), axis=1)
min_values_img = img_output_df.apply(lambda x: x.nsmallest(25).mean(), axis=1)

The plot below illustrates the output values that result from using the average of the 25 minimum values.

In [175]:
plt.figure(figsize=(12, 6))

for i in range(0,10):
    plt.plot(output_df.iloc[i], label=f"output {i}")
    plt.scatter(min_values_index[i], min_values_output[i], color="red")

plt.legend(loc="best", fontsize=6)
plt.title("Magnitude of output 0 to 9 with their avg. minimum values")
plt.xticks(range(0, 200, 25))
plt.xlabel("Frequency")
plt.show()

One other option is applying Principal Component Analysis (PCA) to the output data. PCA is a dimensionality-reduction technique that projects the data onto a lower-dimensional space while preserving as much of its variance as possible. Applied to the output data, it reduces the 201 frequency columns to a small set of principal components that capture the most significant variance.

In [176]:
pca = PCA(n_components=1) # Only one component is selected for simplicity

principal_components = pca.fit_transform(output_df) # Transform the data

explained_variance_df = pd.DataFrame({
    'Standard deviation': np.sqrt(pca.explained_variance_),
    'Proportion of Variance': pca.explained_variance_ratio_,
}, index=[f'Comp.{i+1}' for i in range(len(pca.explained_variance_))])

explained_variance_df
Out[176]:
Standard deviation Proportion of Variance
Comp.1 2.400357 0.610351

Results of the Principal Component Analysis (PCA) indicate that 61% of the variability in the magnitude of the S11 parameters can be captured by a single component. This is advantageous, since variability is precisely what machine learning models learn from during training.

When all of the principal components are considered, it can be seen that only 3 components are enough to capture roughly 86% of the variability in the data.

In [177]:
pca = PCA() # No limit on the number of components

principal_components = pca.fit_transform(output_df) # Transform the data

explained_variance_df = pd.DataFrame({
    'Standard deviation': np.sqrt(pca.explained_variance_),
    'Proportion of Variance': pca.explained_variance_ratio_,
    'Cumulative Proportion': np.cumsum(pca.explained_variance_ratio_)
}, index=[f'Comp.{i+1}' for i in range(len(pca.explained_variance_))])

explained_variance_df
Out[177]:
Standard deviation Proportion of Variance Cumulative Proportion
Comp.1 2.400357e+00 6.103513e-01 0.610351
Comp.2 1.119120e+00 1.326727e-01 0.743024
Comp.3 1.064786e+00 1.201026e-01 0.863127
Comp.4 7.307837e-01 5.657257e-02 0.919699
Comp.5 4.739296e-01 2.379336e-02 0.943493
... ... ... ...
Comp.197 4.019508e-14 1.711489e-28 1.000000
Comp.198 1.117213e-14 1.322210e-29 1.000000
Comp.199 5.747085e-15 3.498835e-30 1.000000
Comp.200 2.505320e-15 6.648975e-31 1.000000
Comp.201 1.065973e-15 1.203706e-31 1.000000

201 rows × 3 columns
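As an aside, scikit-learn can also choose the component count automatically: passing a float in (0, 1) as `n_components` keeps the smallest number of components whose cumulative explained variance reaches that fraction. A minimal sketch on synthetic data (not the homework data):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data with a few dominant directions plus small noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 4))   # 4 underlying factors
mixing = rng.normal(size=(4, 50))    # mixed into 50 observed columns
data = latent @ mixing + 0.01 * rng.normal(size=(300, 50))

# A float n_components means "keep enough components for 85% of the variance"
pca = PCA(n_components=0.85)
pca.fit(data)

print(pca.n_components_, pca.explained_variance_ratio_.sum())
```

This avoids hand-picking a cutoff from the cumulative-proportion table.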

The same technique is applied to the real and imaginary parts of the data as well.

In [178]:
pca = PCA(n_components=1) # Only one component is selected for simplicity

principal_components_real = pca.fit_transform(real_output_df) # Transform the data

explained_variance_df = pd.DataFrame({
    'Standard deviation': np.sqrt(pca.explained_variance_),
    'Proportion of Variance': pca.explained_variance_ratio_,
}, index=[f'Comp.{i+1}' for i in range(len(pca.explained_variance_))])

explained_variance_df
Out[178]:
Standard deviation Proportion of Variance
Comp.1 8.817202 0.855197

PCA on the real part of the data performs even better: a single component captures about 85% of the variability. The table below shows that only 5 components are enough to capture almost 99% of it.

In [179]:
pca = PCA() # No limit on the number of components

principal_components = pca.fit_transform(real_output_df) # Transform the data

explained_variance_df = pd.DataFrame({
    'Standard deviation': np.sqrt(pca.explained_variance_),
    'Proportion of Variance': pca.explained_variance_ratio_,
    'Cumulative Proportion': np.cumsum(pca.explained_variance_ratio_)
}, index=[f'Comp.{i+1}' for i in range(len(pca.explained_variance_))])

explained_variance_df
Out[179]:
Standard deviation Proportion of Variance Cumulative Proportion
Comp.1 8.817202e+00 8.551972e-01 0.855197
Comp.2 2.161807e+00 5.140891e-02 0.906606
Comp.3 1.999371e+00 4.397353e-02 0.950580
Comp.4 1.486475e+00 2.430636e-02 0.974886
Comp.5 9.854447e-01 1.068241e-02 0.985568
... ... ... ...
Comp.197 8.123715e-16 7.259622e-33 1.000000
Comp.198 8.123715e-16 7.259622e-33 1.000000
Comp.199 8.123715e-16 7.259622e-33 1.000000
Comp.200 4.924351e-16 2.667489e-33 1.000000
Comp.201 3.381690e-16 1.257976e-33 1.000000

201 rows × 3 columns

In [180]:
pca = PCA(n_components=1) # Only one component is selected for simplicity

principal_components_img = pca.fit_transform(img_output_df) # Transform the data

explained_variance_df = pd.DataFrame({
    'Standard deviation': np.sqrt(pca.explained_variance_),
    'Proportion of Variance': pca.explained_variance_ratio_,
}, index=[f'Comp.{i+1}' for i in range(len(pca.explained_variance_))])

explained_variance_df
Out[180]:
Standard deviation Proportion of Variance
Comp.1 3.135929 0.419402

PCA on the imaginary part of the data performs worse than on the other data sets: a single component captures only about 42% of the variability. This approach can still be effective, however, depending on the specific use case.

The table below shows that only 5 components are enough to capture almost 95% of the variability. For simplicity, only the 1st component is used in the following analysis, although using more components could give better results.

In [181]:
pca = PCA() # No limit on the number of components

principal_components = pca.fit_transform(img_output_df) # Transform the data

explained_variance_df = pd.DataFrame({
    'Standard deviation': np.sqrt(pca.explained_variance_),
    'Proportion of Variance': pca.explained_variance_ratio_,
    'Cumulative Proportion': np.cumsum(pca.explained_variance_ratio_)
}, index=[f'Comp.{i+1}' for i in range(len(pca.explained_variance_))])

explained_variance_df
Out[181]:
Standard deviation Proportion of Variance Cumulative Proportion
Comp.1 3.135929e+00 4.194022e-01 0.419402
Comp.2 2.299117e+00 2.254344e-01 0.644837
Comp.3 1.941158e+00 1.607015e-01 0.805538
Comp.4 1.555338e+00 1.031687e-01 0.908707
Comp.5 9.440560e-01 3.800964e-02 0.946716
... ... ... ...
Comp.197 2.103890e-16 1.887750e-33 1.000000
Comp.198 2.086132e-16 1.856015e-33 1.000000
Comp.199 1.914544e-16 1.563251e-33 1.000000
Comp.200 1.734331e-16 1.282809e-33 1.000000
Comp.201 1.140238e-16 5.544844e-34 1.000000

201 rows × 3 columns
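To make the reduction concrete, the sketch below (on synthetic curves, not the S11 data) shows that keeping only a few components still yields full-length reconstructions via `inverse_transform`, so truncation trades a little fidelity for a much smaller representation.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-ins for 201-point frequency responses
freq = np.linspace(0.0, 1.0, 201)
rng = np.random.default_rng(0)
curves = np.array([np.sin(2 * np.pi * (1.0 + f0) * freq) for f0 in rng.random(50)])

# Compress each 201-point curve to 3 numbers, then map back
pca = PCA(n_components=3)
scores = pca.fit_transform(curves)             # shape (50, 3)
reconstructed = pca.inverse_transform(scores)  # shape (50, 201)

print(scores.shape, reconstructed.shape)
```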

3c. Analysis of the Input ¶

In [182]:
input_data_loc = "data/hw1_input.csv"

input_df = pd.read_csv(input_data_loc)

input_df.head()
Out[182]:
length of patch width of patch height of patch height of substrate height of solder resist layer radius of the probe c_pad c_antipad c_probe dielectric constant of substrate dielectric constant of solder resist layer
0 3.202024 2.288742 0.012514 0.139247 0.041757 0.028566 0.000549 0.032403 0.348140 3.735926 4.278575
1 2.107848 2.895504 0.037171 0.149492 0.056775 0.028930 0.005536 0.053647 0.326369 4.929862 4.876068
2 3.252113 4.818411 0.025432 0.578834 0.029972 0.030922 0.020274 0.049845 0.446639 4.772670 4.745106
3 4.161509 2.294309 0.011058 0.117266 0.093223 0.017604 0.001135 0.098610 0.055665 4.102438 3.755671
4 4.820912 2.948325 0.019658 0.163503 0.094337 0.025757 0.021725 0.072813 0.272282 2.531031 3.047553

The input is composed of several geometric features. As can be observed in the table above, these features have distinct ranges, so the data must be scaled to prevent features with larger ranges from dominating the models. This scaling is performed after an in-depth examination of the raw data.

In [183]:
input_df.describe() # Descriptive statistics of the input data
Out[183]:
length of patch width of patch height of patch height of substrate height of solder resist layer radius of the probe c_pad c_antipad c_probe dielectric constant of substrate dielectric constant of solder resist layer
count 385.000000 385.000000 385.000000 385.000000 385.000000 385.000000 385.000000 385.000000 385.000000 385.000000 385.000000
mean 3.569210 3.536983 0.024273 0.347643 0.060065 0.032198 0.012797 0.060648 0.245586 3.704384 3.521911
std 0.966173 1.182100 0.008800 0.272738 0.023670 0.010352 0.007111 0.021503 0.111245 0.853877 0.871233
min 1.805658 1.801273 0.010008 0.100321 0.020039 0.015012 0.000003 0.025292 0.050810 2.023380 2.001679
25% 2.755534 2.501163 0.016194 0.126901 0.038689 0.023389 0.006985 0.042011 0.148565 2.998152 2.783710
50% 3.637716 3.215396 0.024198 0.155254 0.060764 0.030979 0.012454 0.060532 0.245049 3.866295 3.480916
75% 4.369311 4.829731 0.031688 0.649324 0.080247 0.041819 0.019014 0.078227 0.340203 4.375551 4.278575
max 5.199919 5.198689 0.039843 0.799082 0.099728 0.049960 0.024996 0.099945 0.449599 4.999324 4.999950

Besides the relation between the height of substrate and the width of patch, there does not appear to be a clear relationship among the input features.

In [184]:
input_df.corr() # Correlation matrix
Out[184]:
length of patch width of patch height of patch height of substrate height of solder resist layer radius of the probe c_pad c_antipad c_probe dielectric constant of substrate dielectric constant of solder resist layer
length of patch 1.000000 -0.114174 -0.026032 -0.064344 0.044502 -0.069905 -0.005560 -0.009627 0.081735 -0.037448 -0.014496
width of patch -0.114174 1.000000 0.091726 0.923739 -0.021056 0.035675 -0.044198 -0.013358 0.029999 0.442193 0.038746
height of patch -0.026032 0.091726 1.000000 0.082833 0.036045 0.030689 0.002422 0.037583 0.031592 -0.044318 -0.037769
height of substrate -0.064344 0.923739 0.082833 1.000000 -0.011870 0.017410 -0.031571 -0.009432 0.044166 0.459847 0.002421
height of solder resist layer 0.044502 -0.021056 0.036045 -0.011870 1.000000 -0.012756 -0.018044 -0.017071 0.024842 -0.010967 -0.038405
radius of the probe -0.069905 0.035675 0.030689 0.017410 -0.012756 1.000000 0.002906 -0.000523 0.048088 -0.025728 0.007524
c_pad -0.005560 -0.044198 0.002422 -0.031571 -0.018044 0.002906 1.000000 0.067678 -0.015500 -0.049318 0.015284
c_antipad -0.009627 -0.013358 0.037583 -0.009432 -0.017071 -0.000523 0.067678 1.000000 -0.132321 -0.016741 0.067821
c_probe 0.081735 0.029999 0.031592 0.044166 0.024842 0.048088 -0.015500 -0.132321 1.000000 0.015640 -0.029101
dielectric constant of substrate -0.037448 0.442193 -0.044318 0.459847 -0.010967 -0.025728 -0.049318 -0.016741 0.015640 1.000000 0.060402
dielectric constant of solder resist layer -0.014496 0.038746 -0.037769 0.002421 -0.038405 0.007524 0.015284 0.067821 -0.029101 0.060402 1.000000

Again, besides the height of substrate and the width of patch, no notable relationships appear between the features.

In [185]:
pair_plot = sns.pairplot(input_df, kind='scatter', diag_kind='kde', markers='o', plot_kws={'alpha':0.5}) # Pair plot of the input data

plt.show()

Now, let us look more closely at the relation between the height of substrate and the width of patch. As can be seen below, the points fall into two separate clusters.

In [186]:
plt.figure(figsize=(12, 6))

plt.scatter(input_df["height of substrate"], input_df["width of patch"])

plt.title("Width of Patch vs Height of Substrate")
plt.xlabel("Height of Substrate")
plt.ylabel("Width of Patch")
plt.show()

To reduce the dimensions of the space, variables can be combined as follows.

In [187]:
input_df["width of patch combined with height of substrate"] = np.where((input_df["width of patch"] > 4) & (input_df["height of substrate"] > 0.4), 1, 0)

input_df.drop(["width of patch", "height of substrate"], axis=1, inplace=True)

The plot below suggests that the combined indicator of these two variables may help capture the dynamics of the minimum magnitude. Therefore, this transformation is used in the following models.

In [188]:
color = {0: "red", 1: "blue"} # Color mapping for the scatter plot to show the combined feature

plt.figure(figsize=(10, 6))
plt.scatter(range(len(min_values_index)), min_values_output, 
            c=input_df["width of patch combined with height of substrate"].map(color)) 

plt.title("Minimum Magnitudes with respect to width of patch combined with height of substrate")
plt.ylabel("Minimum Magnitudes")
plt.xticks([])
plt.show()

The data needs to be scaled, as previously discussed. To reduce the effect of outliers (if any exist), standardization is applied.

In [189]:
scaler = StandardScaler() # Standardize the data

input_df_scaled = pd.DataFrame(scaler.fit_transform(input_df), columns=input_df.columns) # Transform the data

input_df_scaled.head()
Out[189]:
length of patch height of patch height of solder resist layer radius of the probe c_pad c_antipad c_probe dielectric constant of substrate dielectric constant of solder resist layer width of patch combined with height of substrate
0 -0.380536 -1.338036 -0.774472 -0.351267 -1.724662 -1.315281 0.923074 0.036988 0.869628 -0.798935
1 -1.514495 1.467538 -0.139201 -0.316141 -1.022443 -0.326035 0.727110 1.437061 1.556322 -0.798935
2 -0.328626 0.131827 -1.272998 -0.123460 1.052887 -0.503034 1.809644 1.252729 1.405808 1.251666
3 0.613834 -1.503664 1.402635 -1.411656 -1.642198 1.767754 -1.709451 0.466780 0.268659 -0.798935
4 1.297212 -0.525081 1.449735 -0.622998 1.257188 0.566491 0.240287 -1.375936 -0.545176 -0.798935

Correlations between the features and the average of the 25 minimum magnitude values are calculated below. The previously constructed combined feature shows a strong correlation and may be effective in linear models.

In [190]:
input_df_scaled.corrwith(min_values_output).sort_values(ascending=False)
Out[190]:
length of patch                                     0.149842
c_pad                                               0.069841
c_antipad                                           0.030746
height of solder resist layer                       0.013917
radius of the probe                                -0.007607
c_probe                                            -0.019620
height of patch                                    -0.063299
dielectric constant of solder resist layer         -0.075817
dielectric constant of substrate                   -0.481474
width of patch combined with height of substrate   -0.886938
dtype: float64

Now, let’s apply the PCA method to further simplify the input data set.

In [191]:
pca_df = input_df_scaled.copy()

pca_df.drop("width of patch combined with height of substrate", axis=1, inplace=True) # Dropping the categorical feature

The categorical feature is dropped because PCA is designed for continuous numeric data; including a binary indicator would distort the components.

In [192]:
pca = PCA() # Do not limit the number of components to see the explained variance of all components

principal_components = pca.fit(pca_df) # Fit the data

explained_variance_df = pd.DataFrame({
    'Standard deviation': np.sqrt(pca.explained_variance_),
    'Proportion of Variance': pca.explained_variance_ratio_,
    'Cumulative Proportion': np.cumsum(pca.explained_variance_ratio_)
}, index=[f'Comp.{i+1}' for i in range(len(pca.explained_variance_))])

explained_variance_df
Out[192]:
Standard deviation Proportion of Variance Cumulative Proportion
Comp.1 1.104823 0.135274 0.135274
Comp.2 1.059588 0.124423 0.259697
Comp.3 1.042818 0.120516 0.380213
Comp.4 1.009667 0.112976 0.493189
Comp.5 0.999758 0.110769 0.603958
Comp.6 0.972026 0.104709 0.708667
Comp.7 0.969715 0.104212 0.812878
Comp.8 0.940186 0.097962 0.910840
Comp.9 0.896956 0.089160 1.000000

The table above shows no clear winner that captures most of the variability by itself, since the components explain similar proportions of the variance.

The loadings table below shows how each original feature contributes to the components.

In [193]:
loadings_df = pd.DataFrame(pca.components_.T, columns=[f'Comp.{i+1}' for i in range(len(pca.explained_variance_))], index=pca_df.columns)

loadings_df
Out[193]:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
length of patch 0.319646 -0.025815 -0.595711 0.292983 0.233550 -0.211242 -0.288572 -0.323017 -0.411906
height of patch 0.059382 -0.514137 0.230340 -0.291507 0.372631 -0.540260 0.266577 0.139445 -0.269928
height of solder resist layer 0.269357 -0.215348 -0.217159 -0.430974 0.431274 0.667127 -0.018495 0.118387 0.052558
radius of the probe 0.004249 -0.195689 0.662581 0.255579 0.157651 0.285254 -0.400929 -0.346496 -0.265191
c_pad -0.259423 -0.330291 -0.146192 0.547369 0.014845 0.323602 0.607875 0.005726 -0.167748
c_antipad -0.556287 -0.245124 -0.192633 -0.074022 0.291205 -0.115210 -0.152863 -0.461867 0.503101
c_probe 0.562431 0.034002 0.171857 0.401044 0.294562 -0.131520 0.167223 -0.014750 0.599916
dielectric constant of substrate -0.038179 0.604910 0.133207 -0.206798 0.326084 0.019769 0.462814 -0.461898 -0.194545
dielectric constant of solder resist layer -0.356633 0.337754 0.011891 0.263499 0.563588 -0.037246 -0.222571 0.561443 -0.073395

The plot below shows the variance explained by each component. A threshold of 10% explained variance may be a reasonable rule for deciding how many components to keep.

In [194]:
explained_variance = explained_variance_df["Proportion of Variance"]

plt.figure(figsize=(8, 6))
plt.plot(range(1, len(explained_variance) + 1), explained_variance, marker='o', linestyle='--', color='b')
plt.axhline(y=0.1, color='r', linestyle='--') # 10% explained variance threshold

plt.title("Explained Variance of Principal Components")
plt.xlabel('Principal Components')
plt.ylabel('Variance Explained')
plt.xticks(range(1, len(explained_variance) + 1))
plt.grid()
plt.show()

The plot below shows the cumulative variance explained by the components: about 80% of the variance is explained by 7 components.

In [195]:
cumulative_variance = explained_variance_df["Cumulative Proportion"]

plt.figure(figsize=(8, 6))
plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, marker='o', linestyle='--', color='b')
plt.axhline(y=0.8, color='r', linestyle='--')  # 80% cumulative explained variance threshold

plt.title('Cumulative Explained Variance of Principal Components')
plt.xlabel('Principal Components')
plt.ylabel('Variance Explained')
plt.xticks(range(1, len(cumulative_variance) + 1))
plt.grid()
plt.show()

Finally, seven components are retained. The data is then transformed into the new space, with the categorical feature added back.

In [196]:
transformed_input_df = pd.DataFrame(pca.transform(pca_df), columns=[f'PC{i+1}' for i in range(len(pca.explained_variance_))]) # Transform the data

transformed_input_df.drop(columns=["PC9", "PC8"], inplace=True, axis=1) # Dropping the last two components

# Adding the categorical feature
transformed_input_df["width of patch combined with height of substrate"] = input_df_scaled["width of patch combined with height of substrate"] 

transformed_input_df.head()
Out[196]:
PC1 PC2 PC3 PC4 PC5 PC6 PC7 width of patch combined with height of substrate
0 0.975511 2.172806 0.533331 0.267580 -0.611399 -0.373230 -0.961117 -0.798935
1 -0.190132 1.213715 1.608164 -1.023334 1.533051 -1.074461 0.826188 -0.798935
2 0.034671 1.308687 0.858340 1.833032 1.006886 -0.753197 1.489438 1.251666
3 -1.153707 1.155495 -2.280813 -2.088161 0.255904 0.704724 -1.436043 -0.798935
4 0.512146 -1.516188 -2.062709 0.632568 0.132800 1.102164 -0.089013 -0.798935

4. Comparison in Linear Models ¶

In the following section, error metrics are used to compare different feature sets and outputs. This method was chosen because examining the differences in these error metrics provides valuable insight into the performance of the models.

In [197]:
# Error metrics
def accu(actual, forecast):
    error = actual - forecast
    mean = np.mean(actual)
    sd = np.std(actual)
    CV = sd / mean # Coefficient of Variation
    FBias = np.sum(error) / np.sum(actual) # Forecast Bias
    MAPE = np.sum(np.abs(error / (actual+1))) / len(actual) # Mean Absolute Percentage Error (denominator shifted by 1, presumably to avoid division by values near zero)
    MAD = np.sum(np.abs(error)) / len(actual) # Mean Absolute Deviation
    MADP = np.sum(np.abs(error)) / np.sum(np.abs(actual)) # Mean Absolute Deviation Percentage
    WMAPE = MAD / mean # Weighted Mean Absolute Percentage Error
    
    # Formatting the results
    result_dict = {
        "n": len(actual),
        "mean": mean,
        "sd": sd,
        "CV": CV,
        "FBias": FBias,
        "MAPE": MAPE,
        "MAD": MAD,
        "MADP": MADP,
        "WMAPE": WMAPE
    }
    
    return pd.DataFrame(result_dict, index=[0])

The real part of the output is partitioned into training and testing sets to assess the performance of linear models. The training set (80% of the data) is used to fit the models, while the testing set (the remaining 20%) evaluates performance on unseen data. Note that a fixed random state is used so the splits are reproducible and the comparisons are fair.

In [198]:
# Splitting the data into train and test sets

X_train_transformed, X_test_transformed, y_train_min_real, y_test_min_real = train_test_split(transformed_input_df, min_values_real, test_size=0.2, random_state=5)
X_train, X_test, y_train_pca, y_test_pca = train_test_split(input_df_scaled, principal_components_real, test_size=0.2, random_state=5)

# Adding the constant term to the input data

X_train_transformed = sm.add_constant(X_train_transformed)
X_train = sm.add_constant(X_train)

# Models
model_1 = sm.OLS(y_train_min_real, X_train).fit()
model_2 = sm.OLS(y_train_min_real, X_train_transformed).fit()
model_3 = sm.OLS(y_train_pca, X_train).fit()
model_4 = sm.OLS(y_train_pca, X_train_transformed).fit()

# Error metrics for the models
results = {
    "Min Real Values + Original Input": accu(y_train_min_real, model_1.predict(X_train)),
    "Min Real Values + PCA Components": accu(y_train_min_real, model_2.predict(X_train_transformed)),
    "PCA Real Values + Original Input": accu(y_train_pca.flatten(), model_3.predict(X_train)),
    "PCA Real Values + PCA Components": accu(y_train_pca.flatten(), model_4.predict(X_train_transformed))
}

for result in results:
    print(f"Error Metrics for {result}:")
    display(results[result])
Error Metrics for Min Real Values + Original Input:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 308 -0.560067 0.479093 -0.855421 -3.572008e-16 0.932317 0.099102 0.155397 -0.176946
Error Metrics for Min Real Values + PCA Components:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 308 -0.560067 0.479093 -0.855421 -2.786809e-16 0.971439 0.099467 0.155969 -0.177598
Error Metrics for PCA Real Values + Original Input:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 308 -0.011962 8.799187 -735.612751 1.436820e-13 0.411477 1.555944 0.186104 -130.077007
Error Metrics for PCA Real Values + PCA Components:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 308 -0.011962 8.799187 -735.612751 1.880402e-14 0.434649 1.576921 0.188613 -131.830723

First, the training set of the real part is used to fit and compare the linear models. The PCA-transformed outputs have much higher variance than the min-based output, which suggests PCA may be more effective at capturing the dynamics of the data. The error metrics show that the min-based output performs slightly better than the PCA-transformed output, though the difference is not significant; on the MAPE metric, the PCA-transformed output performs better.

Then, the test set of the real part is used to evaluate and compare the models. The results mirror the training set: the min-based output performs slightly better than the PCA-transformed output, but not significantly so.

In [199]:
X_test = sm.add_constant(X_test)
X_test_transformed = sm.add_constant(X_test_transformed)

# Making predictions on the test data
y_pred_1 = model_1.predict(X_test)
y_pred_2 = model_2.predict(X_test_transformed)
y_pred_3 = model_3.predict(X_test)
y_pred_4 = model_4.predict(X_test_transformed)

results = {
    "Min Real Values + Original Input": accu(y_test_min_real, y_pred_1),
    "Min Real Values + PCA Components": accu(y_test_min_real, y_pred_2),
    "PCA Real Values + Original Input": accu(y_test_pca.flatten(), y_pred_3),
    "PCA Real Values + PCA Components": accu(y_test_pca.flatten(), y_pred_4)
}

for result in results:
    print(f"Error Metrics for {result}:")
    display(results[result])
Error Metrics for Min Real Values + Original Input:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 77 -0.543346 0.487735 -0.897651 0.013614 0.757313 0.094983 0.152352 -0.174811
Error Metrics for Min Real Values + PCA Components:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 77 -0.543346 0.487735 -0.897651 0.013971 0.764249 0.094574 0.151696 -0.174058
Error Metrics for PCA Real Values + Original Input:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 77 0.047847 8.83176 184.583972 -4.061741 1.831829 1.572254 0.186343 32.860138
Error Metrics for PCA Real Values + PCA Components:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 77 0.047847 8.83176 184.583972 -3.759178 2.283109 1.515366 0.179601 31.671181

As evident from the table, transforming the input data does not significantly change the outcomes. However, the fact that the PCA-transformed inputs perform on par with the original inputs means comparable results were achieved with fewer inputs.

Now, let's inspect model 3 in more detail. The plot below shows the predicted values against the actual values.

In [200]:
plt.figure(figsize=(12, 6))

plt.scatter(y_train_pca, model_3.predict(X_train), c="red")

plt.plot(y_train_pca, y_train_pca, color="blue")

plt.xlabel("Actual values")
plt.ylabel("Predicted values")
plt.title("Predicted vs Actual values")
plt.show()

The plot suggests the model predicts some values well but generally lacks the information needed to capture the dynamics of the data. The actual data clearly splits into two parts, and the model captures this structure only partially. More sophisticated models may be needed; one option is to split the data into two parts and fit a separate model to each.

Now, let’s apply the same models to the imaginary part of the data.

In [201]:
# Splitting the data into train and test sets
# (variable names from the real-part analysis are reused here for the imaginary part)
X_train_transformed, X_test_transformed, y_train_min_real, y_test_min_real = train_test_split(transformed_input_df, min_values_img, test_size=0.2, random_state=16)
X_train, X_test, y_train_pca, y_test_pca = train_test_split(input_df_scaled, principal_components_img, test_size=0.2, random_state=16)

# Adding the constant term to the input data

X_train_transformed = sm.add_constant(X_train_transformed)
X_train = sm.add_constant(X_train)

# Models
model_1 = sm.OLS(y_train_min_real, X_train).fit()
model_2 = sm.OLS(y_train_min_real, X_train_transformed).fit()
model_3 = sm.OLS(y_train_pca, X_train).fit()
model_4 = sm.OLS(y_train_pca, X_train_transformed).fit()

# Error metrics for the models
results = {
    "Min Img. Values + Original Input": accu(y_train_min_real, model_1.predict(X_train)),
    "Min Img. Values + PCA Components": accu(y_train_min_real, model_2.predict(X_train_transformed)),
    "PCA Img. Values + Original Input": accu(y_train_pca.flatten(), model_3.predict(X_train)),
    "PCA Img. Values + PCA Components": accu(y_train_pca.flatten(), model_4.predict(X_train_transformed))
}

for result in results:
    print(f"Error Metrics for {result}:")
    display(results[result])
Error Metrics for Min Img. Values + Original Input:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 308 -0.105832 0.344569 -3.255805 1.335142e-15 0.326074 0.214162 0.684306 -2.023598
Error Metrics for Min Img. Values + PCA Components:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 308 -0.105832 0.344569 -3.255805 5.177083e-16 0.344986 0.224031 0.715842 -2.116853
Error Metrics for PCA Img. Values + Original Input:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 308 -0.043142 3.053747 -70.783449 1.310099e-14 3.713698 1.447139 0.559968 -33.543547
Error Metrics for PCA Img. Values + PCA Components:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 308 -0.043142 3.053747 -70.783449 2.914303e-14 3.236567 1.499687 0.580301 -34.761567

Here, it can be seen that the PCA-derived outputs again have more variance, suggesting that the PCA approach may be more effective at capturing the dynamics of the data. The error metrics show that the original (minimum-value) output performs slightly better than the PCA-derived outputs.
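The variance claim can be checked directly: the first principal component is, by construction, the direction of maximum variance, so its score variance is at least that of any single raw output column. A minimal sketch on synthetic data (the matrix shapes are illustrative, not the actual S11 dataset):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for the output matrix (samples x frequency points).
outputs = rng.normal(size=(385, 201))
outputs[:, :50] += 3 * rng.normal(size=(385, 1))  # one dominant shared direction

pca = PCA(n_components=1)
scores = pca.fit_transform(outputs)

print("PC1 variance:", scores.var(ddof=1))
print("share of total variance:", pca.explained_variance_ratio_[0])
print("max single-column variance:", outputs.var(axis=0, ddof=1).max())
```

By contrast, taking the minimum value per sample discards all information outside that one frequency point, which is consistent with the lower variance observed above.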

Then the test set of the imaginary part is used to evaluate and compare the linear models. The results are similar to those on the training set: the original (minimum-value) output performs slightly better than the PCA-derived output, and the gap is more pronounced than for the real part. This implies that the PCA-output model could be improved by using more components, more sophisticated models, or different approaches.

In [202]:
X_test = sm.add_constant(X_test)
X_test_transformed = sm.add_constant(X_test_transformed)

# Making predictions on the test data
y_pred_1 = model_1.predict(X_test)
y_pred_2 = model_2.predict(X_test_transformed)
y_pred_3 = model_3.predict(X_test)
y_pred_4 = model_4.predict(X_test_transformed)

results = {
    "Min Img. Values + Original Input": accu(y_test_min_real, y_pred_1),
    "Min Img. Values + PCA Components": accu(y_test_min_real, y_pred_2),
    "PCA Img. Values + Original Input": accu(y_test_pca.flatten(), y_pred_3),
    "PCA Img. Values + PCA Components": accu(y_test_pca.flatten(), y_pred_4)
}

for result in results:
    print(f"Error Metrics for {result}:")
    display(results[result])
Error Metrics for Min Img. Values + Original Input:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 77 -0.125335 0.411095 -3.279959 0.39226 0.592393 0.274284 0.740183 -2.188402
Error Metrics for Min Img. Values + PCA Components:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 77 -0.125335 0.411095 -3.279959 0.485525 0.622196 0.279561 0.754424 -2.230505
Error Metrics for PCA Img. Values + Original Input:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 77 0.172568 3.421087 19.824523 -1.970121 1.296612 1.767288 0.592089 10.241086
Error Metrics for PCA Img. Values + PCA Components:
n mean sd CV FBias MAPE MAD MADP WMAPE
0 77 0.172568 3.421087 19.824523 -2.465339 1.238967 1.784526 0.597865 10.340977

Now, let's inspect model 3 in more detail. The plot below shows the predicted values against the actual values. It shows that the linear model captures the distribution to some degree, but it lacks the information needed to capture the dynamics of the data.

In [203]:
plt.figure(figsize=(12, 6))

plt.scatter(y_train_pca, model_3.predict(X_train), c="red")

plt.plot(y_train_pca, y_train_pca, color="blue")

plt.xlabel("Actual values")
plt.ylabel("Predicted values")
plt.title("Predicted vs Actual values")
plt.show()
[Figure: Predicted vs. actual values scatter plot for model 3 (training set, imaginary part)]

Note that in all the applications above, the frequencies that provide the most information (highest variance) are inspected. These types of linear models do not capture relationships among the outputs themselves. To improve future studies, considering relationships between outputs is a promising option.
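One way to exploit relationships among outputs is to regress several principal components jointly and then map the predictions back to the full output curve with `inverse_transform`. The following is a sketch on synthetic data (matrix sizes and the low-rank structure are assumptions for illustration, not properties of the actual dataset):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(16)

# Synthetic stand-ins: geometry inputs X and a multi-frequency output matrix Y
# whose columns are strongly correlated (low-rank signal plus small noise).
X = rng.normal(size=(300, 11))
U = rng.normal(size=(11, 5))
V = rng.normal(size=(5, 201))
Y = X @ U @ V + 0.01 * rng.normal(size=(300, 201))

# Compress the 201 output columns into a few components, regress the component
# scores jointly on the inputs, then reconstruct the full output curve.
pca = PCA(n_components=5)
scores = pca.fit_transform(Y)
reg = LinearRegression().fit(X, scores)  # multi-output regression on the scores
Y_hat = pca.inverse_transform(reg.predict(X))

print("reconstruction MAD:", np.abs(Y - Y_hat).mean())
```

Unlike the single-component or minimum-value targets used above, this predicts the entire S11-style curve at once, so correlations between frequency points are carried by the shared components rather than ignored.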

5. Conclusions ¶

In conclusion, this assignment investigated the connection between antenna geometry parameters and S11 parameter values. Applying Principal Component Analysis (PCA) to the output data reduced the outputs to a single principal component that explained part of the variance in the data. This proved advantageous, as it simplified the data and made the predictive models more efficient. The linear-model results indicated that the PCA-derived outputs captured the variability in the data more effectively than the minimum-value approach, suggesting that PCA is a more capable method for summarizing the dynamics of the data. Nevertheless, the linear models could not fully capture those dynamics, implying that more sophisticated machine learning techniques, or further manipulation of the input data, may be necessary to improve predictive accuracy.

6. Code ¶

Click here to access the code.


AI usage in this homework assignment:

  • Apple Intelligence is utilized to check the grammar of the text.
  • Throughout the code, GitHub Copilot is employed to enhance coding efficiency.